METR is a research nonprofit that evaluates frontier AI models to help companies and wider society understand AI capabilities and the risks they pose.
Featured Research and Resources
METR researches, develops, and runs cutting-edge tests of AI capabilities, including broad autonomous capabilities and the ability of AI systems to accelerate AI R&D. We also study AI behaviors that could threaten the integrity of evaluations, as well as mitigations for such behaviors.
Evaluation Reports
We have worked with companies such as Anthropic and OpenAI to conduct preliminary evaluations of the autonomous capabilities of several frontier AI models. We do this both to understand the capabilities of frontier models and to pilot third-party evaluator arrangements. (We do not accept monetary compensation for this work.) We also occasionally evaluate models independently after they are released, without involvement from the models’ developers. Recent public reports resulting from this work are below, with additional discussion in the respective system cards.
Partnerships
We partner with AI developers such as Anthropic and OpenAI to conduct evaluations of the autonomous capabilities of frontier AI models. We do this both to understand the models’ capabilities and to pilot third-party evaluator arrangements.
METR does not accept monetary compensation from model developers for this work, but companies including OpenAI and Anthropic have provided model access and free compute credits to support our evaluation research. We often use this access and these credits to continue evaluating models independently after they are released, without involvement from the models’ developers.
We are also partnering with the AI Security Institute and are part of the NIST AI Safety Institute Consortium.